library(rgeos)
library(rgdal)
library(sp)
library(maptools)
library(spatstat)
library(spdep)
library(INLA)
library(inlabru)
library(readxl)
library(lubridate)
library(ggmap)
library(raster)
Next, let's check our current working directory!
getwd()
## [1] "/Users/joe/ownCloud/DFO_Workshop_2020/DFO_SDM_Workshop_2020"
We should be inside the folder titled: ‘DFO_SDM_Workshop_2020’. If not, change this by navigating to the correct folder in the bottom right panel (folder view), opening it, and then clicking “Set as Working Directory” under the tab ‘More’.
Now we can load in the precompiled data, models, and functions.
load('./Data/Modelling_Data2.RData')
fit <- readRDS('./Model_Files/fit.rds')
fit2 <- readRDS('./Model_Files/fit2.rds')
source('utility_functions.R')
Finally, let’s turn off all warnings associated with coordinate reference systems.
rgdal::set_rgdal_show_exportToProj4_warnings(FALSE)
rgdal::set_thin_PROJ6_warnings(TRUE)
options("rgdal_show_exportToProj4_warnings"="none")
In this final session, we will learn how to jointly model presence-only opportunistic sightings with survey data within the inlabru package. We will demonstrate how to fit the models and predict both the spatial whale intensity and population size.
For the whale watch data, we have little knowledge of the activities of the whale watch vessels, and no GPS data are available. Throughout the remainder of this section, we will assume that the effort from each of the two whale watch companies, as captured in the two whale watch effort surfaces \(\lambda_{eff}\), is a function of the distance from its port. Crucially, we will assume that the effort decreases as a function of distance from the port.
Two pieces of knowledge are available to us: the number of daily whale watch trips that operate from each port and the average trip length from each port. These can be found on the websites of the two whale watch operators. These two pieces of information can help us to sensibly constrain the model, hopefully giving us more realistic predictions and inference. We will compare the effects of including and excluding this additional knowledge.
First, we buffer and rescale the two SpatialPixelsDataFrames Dist_Brier and Dist_Quoddy. These contain the distances from the Brier and Quoddy ports respectively. Unlike with previous covariates, however, we will fix all buffered values on land to a large constant. This will help to ensure that negligible effort is recorded from the whale watch vessels on land through \(\lambda_{eff}\).
ggplot() + gg(Dist_Brier) + gg(mesh_land)
# Buffer
pixels_Domain <- pixels(mesh_land, mask = gSimplify(gBuffer(Domain, width = 19), tol = 3),
                        nx = 400, ny = 400)
pixels_Domain$Dist_Brier <- over(pixels_Domain,Dist_Brier)$Dist_Brier
pixels_Domain$Dist_Brier[is.na(pixels_Domain$Dist_Brier)] <- 1e3
Dist_Brier <- pixels_Domain
ggplot() + gg(Dist_Brier) + gg(mesh_land)
# There is an infinite value at the port. Change it to 0
Dist_Brier$Dist_Brier[is.infinite(Dist_Brier$Dist_Brier)] <- 0
max(Dist_Brier$Dist_Brier)
## [1] 1000
# Let's scale the Dist covariates closer to the (0,1) scale
Dist_Brier$Dist_Brier <- Dist_Brier$Dist_Brier / 980.7996
# repeat for distance from Quoddy port
pixels_Domain <- pixels(mesh_land, mask = gSimplify(gBuffer(Domain, width = 19), tol = 3),
                        nx = 400, ny = 400)
pixels_Domain$Dist_Quoddy <- over(pixels_Domain,Dist_Quoddy)$Dist_Quoddy
pixels_Domain$Dist_Quoddy[is.na(pixels_Domain$Dist_Quoddy)] <- 1e3
Dist_Quoddy <- pixels_Domain
# There is an infinite value at the port. Change it to 0
Dist_Quoddy$Dist_Quoddy[is.infinite(Dist_Quoddy$Dist_Quoddy)] <- 0
# Let's scale the Dist covariates closer to the (0,1) scale
Dist_Quoddy$Dist_Quoddy <- Dist_Quoddy$Dist_Quoddy / 980.7996
ggplot() + gg(Dist_Quoddy) + gg(mesh_land)
Now, we investigate a potential functional form for the distance from port covariate. To do this, we plot a histogram of the distances from port at each of the whale watch sighting locations.
# Plot the sightings with distance from the port
hist(over(Sightings_Brier_nodup,Dist_Brier)$Dist_Brier, main='Histogram of the distance from port of the Brier sightings', xlab = 'Distance from port')
hist(over(Sightings_Quoddy_nodup,Dist_Quoddy)$Dist_Quoddy,breaks=20, main='Histogram of the distance from port of the Quoddy sightings', xlab = 'Distance from port')
For both ports, we detect a decreasing frequency of sightings as the distance from port increases. This relationship is clearer for the Brier sightings than for the Quoddy sightings. Furthermore, the functional form of the effect does not appear to be an exponential decay. Based on the histograms, we choose to model the functional form as a half-normal function once again. Note that more complicated functional forms (e.g. Weibull or hazard functions) could be fitted.
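Before formalising this, here is a minimal sketch (with a hypothetical decay scale, sigma) of why the half-normal form is convenient here: on the log scale it is exactly linear in the squared distance.

```r
# Minimal sketch (hypothetical sigma, not estimated from the workshop data):
# the half-normal decay exp(-d^2 / (2 * sigma^2)) is exactly linear in d^2
# on the log scale, so a 'linear' effect on squared distance encodes it.
sigma <- 0.3                          # hypothetical decay scale
d     <- seq(0, 1, length.out = 100)  # scaled distance from port
half_normal <- exp(-d^2 / (2 * sigma^2))
all.equal(log(half_normal), -(1 / (2 * sigma^2)) * d^2)  # TRUE
plot(d, half_normal, type = 'l',
     xlab = 'Distance from port', ylab = 'Relative effort')
```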
On the log scale, this covariate can be estimated as a linear effect on the squared distance from port. Thus, we create SpatialPixelsDataFrame objects which store the squared distances from port.
Dist_Quoddy_sq <- pixels(mesh_land, mask = gSimplify(gBuffer(Domain, width = 19), tol = 3),
                         nx = 400, ny = 400)
Dist_Brier_sq <- Dist_Quoddy_sq
Dist_Quoddy_sq$Dist_Quoddy_sq <- Dist_Quoddy$Dist_Quoddy^2
Dist_Brier_sq$Dist_Brier_sq <- Dist_Brier$Dist_Brier^2
With the covariates all created, we can now define our components and formulae. Unlike before, we need to define a single components object that contains all the required components for both the whale watch and survey sightings.
For our first model, we will ignore all available information about the daily activities of the whale watch vessels. Instead, we will ‘let the data speak for themselves’ and let them inform the most suitable effort surfaces \(\lambda_{eff}\). We will use the data to inform both the shapes and relative magnitudes of \(\lambda_{eff}\) from each port. Let’s define the necessary components and formulae now.
cmp_WW_Combined1 <- ~
mySpatial(main = coordinates, model = matern_land) +
mySpatial_Brier(main = coordinates, copy='mySpatial', fixed=T) +
mySpatial_Quoddy(main = coordinates, copy='mySpatial', fixed=T) +
Dist_Quoddy_sq(main=Dist_Quoddy_sq, model='linear') + # half normal
Dist_Brier_sq(main=Dist_Brier_sq, model='linear') + # half normal
Diff_WW_Q_B(1) + # contrast Brier vs Quoddy
Intercept_WW_Q(1) + # Intercept for Quoddy
Intercept(1) +
lsig(1) +
log_Depth(main = log_Depth, model='linear')
formula_WW_B1 = coordinates ~ mySpatial_Brier +
Dist_Brier_sq +
Intercept_WW_Q +
Diff_WW_Q_B +
log_Depth
formula_WW_Q1 = coordinates ~ mySpatial_Quoddy +
Dist_Quoddy_sq +
Intercept_WW_Q +
log_Depth
There are a lot of new additions to the components. First, we see the addition of mySpatial_Brier and mySpatial_Quoddy. These define copies of the original spatial field mySpatial. To tell inlabru that we want to copy the original model, we use the argument copy = 'mySpatial'. The argument fixed = T tells inlabru that we do not want to scale the copied field by an estimated scalar parameter. Next are the additions of Dist_Quoddy_sq and Dist_Brier_sq. These define linear covariates on the respective SpatialPixelsDataFrame objects (i.e. they define the half-normal functions) and hence describe the shape and range of the effort surfaces. Next, we include the scalar parameters Intercept_WW_Q and Diff_WW_Q_B. These define the intercept for the Quoddy whale watch port and the relative log intensity of the Brier whale watch port compared with Quoddy, and hence help to define the relative total efforts from the two ports. Note that we also include log_Depth - it will become clear why later.
For the formula objects, we include the port-specific spatial field and the port-specific intercept(s). Note that we do not need to update the formula for the survey data.
Now, we use the like/bru approach to combine all three likelihoods (one from the survey and one from each port)! We define the three likelihood objects using the joint set of components defined above.
# from the survey model earlier
fit2_like = like(data = Sightings_survey,
samplers = Effort_survey,
domain = list(
coordinates = mesh_land,
distance = INLA::inla.mesh.1d(seq(0, 2, length.out = 30))),
formula = formula_2,
family = 'cp')
fit_WW_B1 = like(data = Sightings_Brier_nodup,
domain = list(
coordinates = mesh_land),
formula = formula_WW_B1,
family = 'cp')
fit_WW_Q1 = like(data = Sightings_Quoddy_nodup,
domain = list(
coordinates = mesh_land),
formula = formula_WW_Q1,
family = 'cp')
# Slow to fit, so we load a pre-compiled file instead
#fit_combined1 <- bru(fit2_like, fit_WW_B1, fit_WW_Q1, components = cmp_WW_Combined1)
fit_combined1 <- readRDS('./Model_Files/fit_combined1.rds')
fit_combined1$dic$dic # DIC is 2259 - a new benchmark; it cannot be compared with the previous models
## [1] 2259.021
summary(fit_combined1)
## inlabru version: 2.2.4.9000
## INLA version: 20.12.10
## Components:
## mySpatial: Model types main='spde', group='exchangeable', replicate='iid'
## mySpatial_Brier: Copy of 'mySpatial' (types main='spde', group='exchangeable', replicate='iid)
## mySpatial_Quoddy: Copy of 'mySpatial' (types main='spde', group='exchangeable', replicate='iid)
## Dist_Quoddy_sq: Model types main='linear', group='exchangeable', replicate='iid'
## Dist_Brier_sq: Model types main='linear', group='exchangeable', replicate='iid'
## Diff_WW_Q_B: Model types main='linear', group='exchangeable', replicate='iid'
## Intercept_WW_Q: Model types main='linear', group='exchangeable', replicate='iid'
## Intercept: Model types main='linear', group='exchangeable', replicate='iid'
## lsig: Model types main='linear', group='exchangeable', replicate='iid'
## log_Depth: Model types main='linear', group='exchangeable', replicate='iid'
## Likelihoods:
## Family: 'cp'
## Data class: 'SpatialPointsDataFrame'
## Predictor:
## coordinates + distance ~ mySpatial + Intercept + log_Depth +
## log_hn(distance, lsig) + log(2)
## Family: 'cp'
## Data class: 'SpatialPointsDataFrame'
## Predictor:
## coordinates ~ mySpatial_Brier + Dist_Brier_sq + Intercept_WW_Q +
## Diff_WW_Q_B + log_Depth
## Family: 'cp'
## Data class: 'SpatialPointsDataFrame'
## Predictor:
## coordinates ~ mySpatial_Quoddy + Dist_Quoddy_sq + Intercept_WW_Q +
## log_Depth
## Time used:
## Pre = 13.4, Running = 19.3, Post = 1.32, Total = 34
## Fixed effects:
## mean sd 0.025quant 0.5quant 0.975quant mode kld
## Dist_Quoddy_sq -167.182 23.340 -213.845 -166.892 -122.171 -166.315 0
## Dist_Brier_sq -157.240 24.627 -207.539 -156.546 -110.839 -155.168 0
## Diff_WW_Q_B -1.031 0.309 -1.645 -1.029 -0.430 -1.024 0
## Intercept_WW_Q -8.075 0.951 -9.953 -8.072 -6.220 -8.064 0
## Intercept -10.896 0.885 -12.653 -10.889 -9.180 -10.875 0
## lsig 0.062 0.152 -0.251 0.067 0.346 0.077 0
## log_Depth 0.646 0.145 0.363 0.646 0.933 0.645 0
##
## Random effects:
## Name Model
## mySpatial SPDE2 model
## mySpatial_Brier Copy
## mySpatial_Quoddy Copy
##
## Model hyperparameters:
## mean sd 0.025quant 0.5quant 0.975quant mode
## Range for mySpatial 132.22 30.907 85.11 127.47 205.75 117.93
## Stdev for mySpatial 1.52 0.276 1.08 1.49 2.16 1.41
##
## Expected number of effective parameters(stdev): 69.23(0.00)
## Number of equivalent replicates : 192.37
##
## Deviance Information Criterion (DIC) ...............: 2259.02
## Deviance Information Criterion (DIC, saturated) ....: NaN
## Effective number of parameters .....................: -0.358
##
## Watanabe-Akaike information criterion (WAIC) ...: 2.31e+51
## Effective number of parameters .................: 1.16e+51
##
## Marginal log-Likelihood: -1265.48
## Posterior marginals for the linear predictor and
## the fitted values are computed
Notice the lack of samplers in the definitions of the likelihoods for the whale watch data. This is because we assume that the whole domain can be sampled, with the actual sampling intensity governed by the half-normal function of distance from port. Alternatively, we could specify samplers = Domain, which would automatically exclude any values that lie on land. We choose not to do this because we have manually fixed a large distance value on land, achieving the same effect.
The output from summary shows some interesting findings! First, we detect a large negative effect of the scaled squared distance from port. This is expected, indicating a decreased effort intensity as the distance from the port increases.
Finally, and most interestingly, we detect a large significant effect of log_Depth on the whale intensity. This is in agreement with the earlier exploratory analysis. Thus, it appears that we may have gained a substantial amount of precision by incorporating the whale watch data into our model! This should help to improve the spatial predictions of the whale intensity too.
Next, we will investigate the estimated effort surface \(\lambda_{eff}\) from each port.
Dist_Brier_sq_plot <- pixels(mesh=mesh_land,mask=Domain)
Dist_Brier_sq_plot <- predict(fit_combined1,Dist_Brier_sq_plot,~exp(Dist_Brier_sq + Intercept_WW_Q + Diff_WW_Q_B),n.samples = 20, seed=seed)
Dist_Quoddy_sq_plot <- pixels(mesh=mesh_land,mask=Domain)
Dist_Quoddy_sq_plot <- predict(fit_combined1,Dist_Quoddy_sq_plot,~exp(Dist_Quoddy_sq + Intercept_WW_Q),n.samples = 20, seed=seed)
multiplot(
ggplot() + gg(Dist_Brier_sq_plot[1]) + gg(Domain) + colsc(c(Dist_Brier_sq_plot$mean,Dist_Quoddy_sq_plot$mean)) +
ggtitle('Effort surface for Brier'),
ggplot() + gg(Dist_Quoddy_sq_plot[1]) + gg(Domain) + colsc(c(Dist_Brier_sq_plot$mean,Dist_Quoddy_sq_plot$mean)) +
ggtitle('Effort surface for Quoddy')
)
# What is the estimated relative total effort from the two companies Brier/Quoddy?
Rel_Effort <-
predict(fit_combined1, predpts,
~ sum(weight * exp(Dist_Brier_sq + Intercept_WW_Q + Diff_WW_Q_B)) /
sum(weight * exp(Dist_Quoddy_sq + Intercept_WW_Q)),
n.samples = 20, seed=seed)
Rel_Effort
## mean sd q0.025 median q0.975 smin smax cv
## 1 1.537155 0.2831711 1.155905 1.519102 2.03165 1.104719 2.198757 0.1842177
## var
## 1 0.08018587
We see that the vessels from Brier are estimated to travel further from port than those from Quoddy, which are estimated to stay very close to port. Furthermore, we estimate that, in total, the effort from the Brier port is around 1.5 times greater than that from the Quoddy port. This finding will be called into question later.
Now, we plot the estimated intensity from the joint model and estimate the whale population size.
plot_pixels_WW1 <- pixels(mesh_land, mask = Domain)
pred_int_WW1 <- predict(
fit_combined1,
plot_pixels_WW1,
~ exp(mySpatial + Intercept + log_Depth),
n.samples = 20,
seed = seed
)
multiplot(
ggplot() + gg(pred_int2[1]) + gg(Domain) +
gg.spatiallines_mod(Effort_survey, colour ='red') +
colsc(c(pred_int2@data$mean, pred_int_WW1@data$mean)) +
ggtitle('Covariate Model Mean'),
ggplot() + gg(pred_int2[2]) + gg(Domain) +
gg.spatiallines_mod(Effort_survey, colour = 'red') +
colsc(c(pred_int2@data$sd, pred_int_WW1@data$sd)) +
ggtitle('Covariate Model SD'),
ggplot() + gg(pred_int_WW1[1]) + gg(Domain) +
gg.spatiallines_mod(Effort_survey, colour ='red') +
colsc(c(pred_int2@data$mean, pred_int_WW1@data$mean)) +
ggtitle('Joint Model Mean'),
ggplot() + gg(pred_int_WW1[2]) + gg(Domain) +
gg.spatiallines_mod(Effort_survey, colour = 'red') +
colsc(c(pred_int2@data$sd, pred_int_WW1@data$sd)) +
ggtitle('Joint Model SD'),
layout = matrix(
c(1:4),
nrow = 2,
ncol = 2,
byrow = F
)
)
# What is the estimated population size?
Lambda_WW1 <- predict(fit_combined1, predpts,
~ sum(weight * exp(mySpatial + Intercept + log_Depth)),
n.samples = 20, seed=seed)
Lambda_WW1
## mean sd q0.025 median q0.975 smin smax cv
## 1 1376.563 615.4314 629.8632 1340.576 2735.95 563.1256 2743.161 0.4470782
## var
## 1 378755.8
Lambda_df <- rbind(Lambda_WW1,Lambda,Lambda2)
Lambda_df$Model <- c('Joint Model 1','Survey-only no depth','Survey-only with depth')
ggplot(Lambda_df,aes(x=Model, y=mean, ymax=q0.975, ymin=q0.025)) +
geom_point() + geom_errorbar()
Extrapolation error may be severe here. The estimated intensity becomes very large as we head south-east from the tracklines. This is because the model-estimated log_Depth effect has been extrapolated to values of log_Depth beyond those seen near the survey tracklines or ports. Since we do not trust this extrapolation of the log_Depth effect outside the observed range of log_Depth values, it is crucial that we restrict our plot to those pixels within 30 km of the nearest trackline.
# Compute the indices of the pixels that lie within 30 km of the nearest trackline
restricted_ind <-
which(apply(gWithinDistance(
as(pred_int2, 'SpatialPoints'),
Effort_survey,
dist = 30,
byid = T
), 2, sum) > 0)
pred_int2_restricted <- pred_int2[restricted_ind,]
pred_int_WW1_restricted <- pred_int_WW1[restricted_ind,]
multiplot(ggplot() + gg(pred_int2_restricted[1]) + gg(Domain) + gg.spatiallines_mod(Effort_survey, colour='red') +
colsc(quantile(c(pred_int2_restricted@data$mean,pred_int_WW1_restricted@data$mean),probs = c(0.000,1))) + ggtitle('Covariate Model Mean'),
ggplot() + gg(pred_int2_restricted[2]) + gg(Domain) + gg.spatiallines_mod(Effort_survey, colour='red') +
colsc(quantile(c(pred_int2_restricted@data$sd,pred_int_WW1_restricted@data$sd),probs = c(0.001,1))) + ggtitle('Covariate Model SD'),
ggplot() + gg(pred_int_WW1_restricted[1]) + gg(Domain) + gg.spatiallines_mod(Effort_survey, colour='red') +
colsc(quantile(c(pred_int2_restricted@data$mean,pred_int_WW1_restricted@data$mean),probs = c(0.001,1))) + ggtitle('Joint Model Mean'),
ggplot() + gg(pred_int_WW1_restricted[2]) + gg(Domain) + gg.spatiallines_mod(Effort_survey, colour='red') +
colsc(quantile(c(pred_int2_restricted@data$sd,pred_int_WW1_restricted@data$sd),probs = c(0.001,1))) + ggtitle('Joint Model SD'),
layout = matrix(c(1:4),nrow=2,ncol=2,byrow = F))
# and in the restricted region?
Lambda_WW1_restricted <- predict(fit_combined1, predpts_restricted, ~ sum(weight * exp(mySpatial + Intercept +
log_Depth)), n.samples = 20, seed=seed)
Lambda_WW1_restricted
## mean sd q0.025 median q0.975 smin smax cv
## 1 579.256 137.1079 417.4011 537.829 848.8321 385.4992 926.9183 0.2366966
## var
## 1 18798.59
Lambda_df_restricted <- rbind(Lambda_WW1_restricted,Lambda_restricted,Lambda2_restricted)
Lambda_df_restricted$Model <- c('Joint Model 1','Survey-only no depth','Survey-only with depth')
ggplot(Lambda_df_restricted,aes(x=Model, y=mean, ymax=q0.975, ymin=q0.025)) +
geom_point() + geom_errorbar() + ggtitle('Estimated Population Size in Restricted Region')
We see that the estimated population sizes within the restricted domain are much more similar than those across the whole domain. However, the joint model still predicts that a larger population of whales is present. Additionally, the posterior distributions of the total population size are more variable under the joint model than under the survey-only models.
Qualitatively, the intensity surfaces from the joint and survey-only models appear similar. ‘Hotspots’ are mostly consistent across the two models, albeit with some slight evidence of a higher whale intensity around the whale watch ports. However, under the joint model, the ‘hotspots’ of whale activity detected by the survey-only model become much more pronounced. This is likely due to the large log_Depth effect that was detected (try plotting log_Depth to see for yourself!).
For our second model, we will use the available information on the fleet sizes of the two whale watch companies and the number and duration of trips that leave each port. Looking at the two websites, we find the following information:
Note that the previous model estimated Brier’s total effort to be only ~1.5 times greater! This is unrealistic… Instead, in model 2, we will fix the total effort from Brier equal to 3.22 times that of Quoddy. To achieve this, we will add a log(3.22) offset to the Brier linear predictor.
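As a hedged illustration of where a figure like 3.22 comes from (the actual fleet sizes, trip counts, and durations are those reported on the operators' websites; every number below is hypothetical):

```r
# Hypothetical numbers for illustration only -- the workshop's 3.22 is
# derived from the values reported on the two operators' websites.
n_boats_Brier  <- 2; trips_per_day_Brier  <- 3; hours_per_trip_Brier  <- 3
n_boats_Quoddy <- 1; trips_per_day_Quoddy <- 4; hours_per_trip_Quoddy <- 2
# Total daily effort (boat-hours on the water) from each port:
effort_Brier  <- n_boats_Brier  * trips_per_day_Brier  * hours_per_trip_Brier
effort_Quoddy <- n_boats_Quoddy * trips_per_day_Quoddy * hours_per_trip_Quoddy
effort_Brier / effort_Quoddy  # 2.25 with these made-up values
```

The ratio enters the model on the log scale as a fixed offset, log(ratio).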
Note that we also need to define an integration function to integrate the distance function across the domain, with the current guess of the parameters Dist_Brier_sq and Dist_Quoddy_sq. We will use these to normalise the two effort surfaces \(\lambda_{eff}\) and put them on the same scale. Let’s define these now.
int_dist_Brier <- function(par){
return(-log(sum(exp(Dist_Brier_sq@data[,1]*par))/2000))
}
int_dist_Quoddy <- function(par){
return(-log(sum(exp(Dist_Quoddy_sq@data[,1]*par))/2000))
}
Note that we divide the total sum by 2000 to rescale the values to a more sensible range to avoid numerical instabilities.
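A quick sketch of what this normalisation achieves (with a hypothetical decay parameter): adding the -log(sum(...)/2000) term to the log intensity divides the effort surface by its rescaled sum, so each port's surface sums to the same constant whatever the decay parameter is.

```r
# Sketch with a hypothetical decay parameter and hypothetical distances:
# the normalising term forces the rescaled surface to sum to 2000,
# regardless of the value of `par`.
par  <- -150
d_sq <- seq(0, 1, length.out = 2000)^2   # hypothetical squared distances
norm_const <- -log(sum(exp(par * d_sq)) / 2000)
surface    <- exp(par * d_sq + norm_const)
sum(surface)  # 2000, independent of par
```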
With these two normalising functions defined, we can now define the components and formulae. There are many ways of defining the model; the one we show has been chosen for its ease of reading. Unlike before, instead of defining model='linear' for the two Dist_X_sq variables, we define model='offset' and separately define the scalar parameters Quoddy_par and Brier_par. This allows us to more easily include the normalising terms int_dist_X in the linear predictors. Note that model='offset' here simply extracts the desired value of the covariate, without scaling it by a parameter.
cmp_WW_Combined2 <-
~ mySpatial_Brier(main = coordinates, copy='mySpatial', fixed=T) +
mySpatial_Quoddy(main = coordinates, copy='mySpatial', fixed=T) +
Dist_Quoddy_sq(main=Dist_Quoddy_sq, model='offset') +
Dist_Brier_sq(main=Dist_Brier_sq, model='offset') +
Intercept_WW_Q(1) +
mySpatial(main = coordinates, model = matern_land) +
Intercept(1) +
lsig(1) +
Quoddy_par(1) +
Brier_par(1) +
log_Depth(main = log_Depth, model='linear')
formula_WW_B2 = coordinates ~ mySpatial_Brier +
Dist_Brier_sq*Brier_par +
Intercept_WW_Q + log(3.22) +
log_Depth +
int_dist_Brier(Brier_par)
formula_WW_Q2 = coordinates ~ mySpatial_Quoddy +
Dist_Quoddy_sq*Quoddy_par +
Intercept_WW_Q +
log_Depth +
int_dist_Quoddy(Quoddy_par)
Note that Diff_WW_Q_B has been replaced with log(3.22), based on the reported trip numbers and durations. Note also that Dist_Brier_sq and Dist_Quoddy_sq are multiplied by Brier_par and Quoddy_par respectively in the formula objects, because they are offsets.
Now we can define the likelihoods and fit the joint model.
fit_WW_B2 = like(data = Sightings_Brier_nodup,
domain = list(
coordinates = mesh_land),
formula = formula_WW_B2,
family = 'cp',
allow_latent = T)
fit_WW_Q2 = like(data = Sightings_Quoddy_nodup,
domain = list(
coordinates = mesh_land),
formula = formula_WW_Q2,
family = 'cp',
allow_latent = T)
# Slow
# fit_combined2 <-
# bru(fit2_like, fit_WW_B2, fit_WW_Q2, components = cmp_WW_Combined2,
# options = list(bru_initial = list(Brier_par = -100,
# Quoddy_par = -100)))
fit_combined2 <- readRDS('./Model_Files/fit_combined2.rds')
fit_combined2$dic$dic # DIC is 2326.411 - a substantially worse fit
## [1] 2326.411
summary(fit_combined2)
## inlabru version: 2.2.4.9000
## INLA version: 20.12.10
## Components:
## mySpatial_Brier: Copy of 'mySpatial' (types main='spde', group='exchangeable', replicate='iid)
## mySpatial_Quoddy: Copy of 'mySpatial' (types main='spde', group='exchangeable', replicate='iid)
## Dist_Quoddy_sq: Model types main='offset', group='exchangeable', replicate='iid'
## Dist_Brier_sq: Model types main='offset', group='exchangeable', replicate='iid'
## Intercept_WW_Q: Model types main='linear', group='exchangeable', replicate='iid'
## mySpatial: Model types main='spde', group='exchangeable', replicate='iid'
## Intercept: Model types main='linear', group='exchangeable', replicate='iid'
## lsig: Model types main='linear', group='exchangeable', replicate='iid'
## Quoddy_par: Model types main='linear', group='exchangeable', replicate='iid'
## Brier_par: Model types main='linear', group='exchangeable', replicate='iid'
## log_Depth: Model types main='linear', group='exchangeable', replicate='iid'
## Likelihoods:
## Family: 'cp'
## Data class: 'SpatialPointsDataFrame'
## Predictor:
## coordinates + distance ~ mySpatial + Intercept + log_Depth +
## log_hn(distance, lsig) + log(2)
## Family: 'cp'
## Data class: 'SpatialPointsDataFrame'
## Predictor:
## coordinates ~ mySpatial_Brier + Dist_Brier_sq * Brier_par + Intercept_WW_Q +
## log(3.22) + log_Depth + int_dist_Brier(Brier_par)
## Family: 'cp'
## Data class: 'SpatialPointsDataFrame'
## Predictor:
## coordinates ~ mySpatial_Quoddy + Dist_Quoddy_sq * Quoddy_par +
## Intercept_WW_Q + log_Depth + int_dist_Quoddy(Quoddy_par)
## Time used:
## Pre = 10.1, Running = 11.5, Post = 0.927, Total = 22.5
## Fixed effects:
## mean sd 0.025quant 0.5quant 0.975quant mode kld
## Intercept_WW_Q -9.381 0.949 -11.254 -9.378 -7.527 -9.371 0
## Intercept -10.890 0.894 -12.665 -10.883 -9.156 -10.869 0
## lsig 0.062 0.152 -0.250 0.067 0.346 0.077 0
## Quoddy_par -160.362 25.511 -211.255 -160.083 -111.058 -159.530 0
## Brier_par -188.373 24.831 -238.356 -187.943 -140.822 -187.087 0
## log_Depth 0.639 0.145 0.357 0.639 0.926 0.637 0
##
## Random effects:
## Name Model
## mySpatial SPDE2 model
## mySpatial_Brier Copy
## mySpatial_Quoddy Copy
##
## Model hyperparameters:
## mean sd 0.025quant 0.5quant 0.975quant mode
## Range for mySpatial 136.72 32.977 85.82 131.89 214.37 122.44
## Stdev for mySpatial 1.51 0.242 1.11 1.49 2.06 1.43
##
## Expected number of effective parameters(stdev): 67.03(0.00)
## Number of equivalent replicates : 198.67
##
## Deviance Information Criterion (DIC) ...............: 2326.41
## Deviance Information Criterion (DIC, saturated) ....: NaN
## Effective number of parameters .....................: 18.71
##
## Watanabe-Akaike information criterion (WAIC) ...: 3.73e+79
## Effective number of parameters .................: 1.87e+79
##
## Marginal log-Likelihood: -1279.61
## Posterior marginals for the linear predictor and
## the fitted values are computed
From the summary output, we see that the DIC for this model is larger than the previous unconstrained model. However, we saw before that the unconstrained model led to unrealistic predictions of the relative total amounts of effort from the two ports. Let’s investigate the predicted intensity and estimated population size.
# Plot the effort surfaces from the two WW companies/ports
Dist_Brier_sq_plot2 <- pixels(mesh=mesh_land,mask=Domain)
Dist_Brier_sq_plot2 <- predict(fit_combined2,Dist_Brier_sq_plot2,~exp(Intercept_WW_Q + log(3.22) + Dist_Brier_sq*Brier_par),n.samples = 20, seed=seed)
ggplot() + gg(Dist_Brier_sq_plot2[1]) + gg(Domain) + colsc(Dist_Brier_sq_plot2[1]$mean) +
ggtitle('Effort surface for Brier')
Dist_Quoddy_sq_plot2 <- pixels(mesh=mesh_land,mask=Domain)
Dist_Quoddy_sq_plot2 <- predict(fit_combined2,Dist_Quoddy_sq_plot2,~exp(Intercept_WW_Q + Dist_Quoddy_sq*Quoddy_par),n.samples = 20, seed=seed)
ggplot() + gg(Dist_Quoddy_sq_plot2[1]) + gg(Domain) + colsc(Dist_Quoddy_sq_plot2[1]$mean) +
ggtitle('Effort surface for Quoddy')
# Plot the (effort-corrected) whale intensity
plot_pixels_WW2 <- pixels(mesh_land, mask = Domain)
pred_int_WW2 <- predict(fit_combined2, plot_pixels_WW2,
~ exp(mySpatial + Intercept + log_Depth), n.samples = 20, seed=seed)
# threshold the colour palette at the 0.5th and 99.5th percentiles for nicer viewing
multiplot(ggplot() + gg(pred_int_WW1[1]) + gg(Domain) + gg.spatiallines_mod(Effort_survey, colour='red') +
colsc(quantile(c(pred_int_WW1@data$mean,pred_int_WW2@data$mean),probs = c(0.005,0.995))) + ggtitle('Joint Model 1 Mean'),
ggplot() + gg(pred_int_WW1[2]) + gg(Domain) + gg.spatiallines_mod(Effort_survey, colour='red') +
colsc(quantile(c(pred_int_WW1@data$sd,pred_int_WW2@data$sd),probs = c(0.005,0.995))) + ggtitle('Joint Model 1 SD'),
ggplot() + gg(pred_int_WW2[1]) + gg(Domain) + gg.spatiallines_mod(Effort_survey, colour='red') +
colsc(quantile(c(pred_int_WW1@data$mean,pred_int_WW2@data$mean),probs = c(0.005,0.995))) + ggtitle('Joint Model 2 Mean'),
ggplot() + gg(pred_int_WW2[2]) + gg(Domain) + gg.spatiallines_mod(Effort_survey, colour='red') +
colsc(quantile(c(pred_int_WW1@data$sd,pred_int_WW2@data$sd),probs = c(0.005,0.995))) + ggtitle('Joint Model 2 SD'),
layout = matrix(c(1:4),nrow=2,ncol=2,byrow = F))
# We may be extrapolating too far here. Let's restrict our plot to those pixels within 30 km of the nearest trackline
pred_int_WW2_restricted <- pred_int_WW2[restricted_ind,]
multiplot(ggplot() + gg(pred_int_WW1_restricted[1]) + gg(Domain) + gg.spatiallines_mod(Effort_survey, colour='red') +
colsc(c(pred_int_WW1_restricted@data$mean,pred_int_WW2_restricted@data$mean)) + ggtitle('Joint Model 1 Mean'),
ggplot() + gg(pred_int_WW1_restricted[2]) + gg(Domain) + gg.spatiallines_mod(Effort_survey, colour='red') +
colsc(c(pred_int_WW1_restricted@data$sd,pred_int_WW2_restricted@data$sd)) + ggtitle('Joint Model 1 SD'),
ggplot() + gg(pred_int_WW2_restricted[1]) + gg(Domain) + gg.spatiallines_mod(Effort_survey, colour='red') +
colsc(c(pred_int_WW1_restricted@data$mean,pred_int_WW2_restricted@data$mean)) + ggtitle('Joint Model 2 Mean'),
ggplot() + gg(pred_int_WW2_restricted[2]) + gg(Domain) + gg.spatiallines_mod(Effort_survey, colour='red') +
colsc(c(pred_int_WW1_restricted@data$sd,pred_int_WW2_restricted@data$sd)) + ggtitle('Joint Model 2 SD'),
layout = matrix(c(1:4),nrow=2,ncol=2,byrow = F))
# What is the estimated population size?
Lambda_WW2 <- predict(fit_combined2, predpts, ~ sum(weight * exp(mySpatial + Intercept +
log_Depth)), n.samples = 20, seed=seed)
Lambda_WW2
## mean sd q0.025 median q0.975 smin smax cv
## 1 1769.488 1931.071 600.2515 1224.456 6486.512 582.9246 9365.736 1.091316
## var
## 1 3729035
Lambda_df <- rbind(Lambda_WW1,Lambda_WW2,Lambda,Lambda2)
Lambda_df$Model <- c('Joint Model 1','Joint Model 2','Survey-only no depth','Survey-only with depth')
ggplot(Lambda_df,aes(x=Model, y=mean, ymax=q0.975, ymin=q0.025)) +
geom_point() + geom_errorbar()
# and in the restricted region?
Lambda_WW2_restricted <-
predict(fit_combined2, predpts_restricted,
~ sum(weight * exp(mySpatial + Intercept + log_Depth)),
n.samples = 20,seed=seed)
Lambda_WW2_restricted
## mean sd q0.025 median q0.975 smin smax cv
## 1 532.4122 112.998 356.8408 520.48 792.1985 314.3052 830.9663 0.2122379
## var
## 1 12768.56
Lambda_df_restricted <- rbind(Lambda_WW1_restricted,Lambda_WW2_restricted,Lambda_restricted,Lambda2_restricted)
Lambda_df_restricted$Model <- c('Joint Model 1','Joint Model 2','Survey-only no depth','Survey-only with depth')
ggplot(Lambda_df_restricted,aes(x=Model, y=mean, ymax=q0.975, ymin=q0.025)) +
geom_point() + geom_errorbar() + ggtitle('Estimated Population Size in Restricted Region')
# Sanity check: is 3.22 contained in the approximate 95% credible interval?
# The order of division is reversed due to the negative sign in the function
Rel_Effort2_pixels <- pixels(mesh=mesh_land,mask=Domain)
Rel_Effort2 <-
predict(fit_combined2, Rel_Effort2_pixels,
~ exp(int_dist_Quoddy(Quoddy_par))/
exp(int_dist_Brier(Brier_par)),
n.samples = 20, seed=seed)
Rel_Effort2 # Good
## mean sd q0.025 median q0.975 smin smax cv
## 1 2.728676 0.5585617 2.019306 2.525411 3.812055 1.843245 3.934326 0.2047006
## var
## 1 0.3119912
From the output of model 2, we find a very similar estimate for the effect of log_Depth. We see that the range of the Quoddy vessels is estimated to exceed that of the Brier vessels, which is the opposite of the finding from the previous model (although the credible intervals do overlap).
Maps of the fitted intensity show a qualitatively similar appearance, albeit with evidence of numerical instability in the random field in the Southern corners of the domain. This instability is reflected in the very large upper credible intervals for the total population size throughout the domain. Future work should correct for this numerical instability if predictions are desired throughout the entire domain.
To avoid this numerical instability, and to further reduce the risk of extrapolation error, we once again restrict our predictions to the smaller domain. We find broad agreement between all four models, with both joint models predicting larger population sizes. Note that the credible intervals for the population size all overlap.
We now move on to our third and final model. In the first model, we completely ignored all available information about the relative amounts of effort from each of the two whale watch ports. In the second model, we calculated an estimate for the relative total effort and constrained the model with this estimate as a fixed offset term.
In reality, we do not know for certain that the total effort from Brier exceeded that of Quoddy by a factor of 3.22. The reported trip times are merely estimates, and the reported number of daily departures fails to account for cancellations due to bad weather or poor turnout. There may also be biases between the two companies with regards to exceeding or failing to meet the advertised trip duration. In summary, there are numerous reasons why our estimated effort ratio of 3.22 may be inaccurate.
To capture this uncertainty, we can use Bayesian machinery and define a prior distribution on the relative total effort. We put a normal prior on the log scale factor, centered at log(3.22), with a prior standard deviation of 0.1. Thus, a priori, we place a probability of 0.95 on the ratio of effort between Brier and Quoddy lying between roughly 2.6 and 3.9.
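As a quick check on this choice of prior, a few lines of base R confirm that a normal prior on the log scale with mean log(3.22) and standard deviation 0.1 places roughly 95% of its mass on effort ratios between 2.6 and 3.9 (a sketch using the usual two-sided 1.96 standard deviation normal interval):

```r
# Approximate 95% prior interval for the Brier-vs-Quoddy effort ratio,
# implied by a Normal(log(3.22), sd = 0.1) prior on the log scale
prior_mean_log <- log(3.22)
prior_sd_log   <- 0.1
ratio_interval <- exp(prior_mean_log + c(-1.96, 1.96) * prior_sd_log)
round(ratio_interval, 2)  # 2.65 3.92
```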
We now define the shared components and the formulae. Then, we define the likelihood objects and fit the joint model using bru.
cmp_WW_Combined3 <- ~ mySpatial_Brier(main = coordinates, copy='mySpatial', fixed=T) +
mySpatial_Quoddy(main = coordinates, copy='mySpatial', fixed=T) +
Dist_Quoddy_sq(main=Dist_Quoddy_sq, model='offset') +
Dist_Brier_sq(main=Dist_Brier_sq, model='offset') +
Intercept_WW_Q(1) +
Diff_QQ_B_Q(main=1,model='linear',mean.linear=1.17, prec.linear=100) +
mySpatial(main = coordinates, model = matern_land) +
Intercept(1) +
lsig(1) +
Quoddy_par(1) +
Brier_par(1) +
log_Depth(main = log_Depth, model='linear')
formula_WW_B3 = coordinates ~ mySpatial_Brier +
Dist_Brier_sq*Brier_par +
Intercept_WW_Q +
Diff_QQ_B_Q +
log_Depth +
int_dist_Brier(Brier_par)
formula_WW_Q3 = coordinates ~ mySpatial_Quoddy +
Dist_Quoddy_sq*Quoddy_par +
Intercept_WW_Q +
log_Depth +
int_dist_Quoddy(Quoddy_par)
fit_WW_B3 = like(data = Sightings_Brier_nodup,
domain = list(
coordinates = mesh_land),
formula = formula_WW_B3,
family = 'cp')
fit_WW_Q3 = like(data = Sightings_Quoddy_nodup,
domain = list(
coordinates = mesh_land),
formula = formula_WW_Q3,
family = 'cp')
# fit_combined3 <- bru(fit2_like, fit_WW_B3, fit_WW_Q3, components = cmp_WW_Combined3)
fit_combined3 <- readRDS('./Model_Files/fit_combined3.rds')
fit_combined3$dic$dic #DIC is 2308 - performance in-between the other 2 models
## [1] 2307.887
summary(fit_combined3)
## inlabru version: 2.2.4.9000
## INLA version: 20.12.10
## Components:
## mySpatial_Brier: Copy of 'mySpatial' (types main='spde', group='exchangeable', replicate='iid)
## mySpatial_Quoddy: Copy of 'mySpatial' (types main='spde', group='exchangeable', replicate='iid)
## Dist_Quoddy_sq: Model types main='offset', group='exchangeable', replicate='iid'
## Dist_Brier_sq: Model types main='offset', group='exchangeable', replicate='iid'
## Intercept_WW_Q: Model types main='linear', group='exchangeable', replicate='iid'
## Diff_QQ_B_Q: Model types main='linear', group='exchangeable', replicate='iid'
## mySpatial: Model types main='spde', group='exchangeable', replicate='iid'
## Intercept: Model types main='linear', group='exchangeable', replicate='iid'
## lsig: Model types main='linear', group='exchangeable', replicate='iid'
## Quoddy_par: Model types main='linear', group='exchangeable', replicate='iid'
## Brier_par: Model types main='linear', group='exchangeable', replicate='iid'
## log_Depth: Model types main='linear', group='exchangeable', replicate='iid'
## Likelihoods:
## Family: 'cp'
## Data class: 'SpatialPointsDataFrame'
## Predictor:
## coordinates + distance ~ mySpatial + Intercept + log_Depth +
## log_hn(distance, lsig) + log(2)
## Family: 'cp'
## Data class: 'SpatialPointsDataFrame'
## Predictor:
## coordinates ~ mySpatial_Brier + Dist_Brier_sq * Brier_par + Intercept_WW_Q +
## Diff_QQ_B_Q + log_Depth + int_dist_Brier(Brier_par)
## Family: 'cp'
## Data class: 'SpatialPointsDataFrame'
## Predictor:
## coordinates ~ mySpatial_Quoddy + Dist_Quoddy_sq * Quoddy_par +
## Intercept_WW_Q + log_Depth + int_dist_Quoddy(Quoddy_par)
## Time used:
## Pre = 8.21, Running = 19.7, Post = 1.36, Total = 29.3
## Fixed effects:
## mean sd 0.025quant 0.5quant 0.975quant mode kld
## Intercept_WW_Q -9.284 1.037 -11.322 -9.282 -7.254 -9.280 0
## Diff_QQ_B_Q 0.928 0.090 0.753 0.928 1.104 0.928 0
## Intercept -11.053 0.962 -12.961 -11.046 -9.182 -11.033 0
## lsig 0.062 0.152 -0.250 0.068 0.346 0.078 0
## Quoddy_par -160.122 25.010 -210.087 -159.825 -111.854 -159.233 0
## Brier_par -175.444 24.788 -225.675 -174.890 -128.320 -173.792 0
## log_Depth 0.651 0.146 0.367 0.650 0.939 0.648 0
##
## Random effects:
## Name Model
## mySpatial SPDE2 model
## mySpatial_Brier Copy
## mySpatial_Quoddy Copy
##
## Model hyperparameters:
## mean sd 0.025quant 0.5quant 0.975quant mode
## Range for mySpatial 154.62 8.935 138.13 154.21 173.25 153.19
## Stdev for mySpatial 1.63 0.246 1.21 1.60 2.17 1.56
##
## Expected number of effective parameters(stdev): 65.35(0.00)
## Number of equivalent replicates : 203.77
##
## Deviance Information Criterion (DIC) ...............: 2307.89
## Deviance Information Criterion (DIC, saturated) ....: NaN
## Effective number of parameters .....................: 13.48
##
## Watanabe-Akaike information criterion (WAIC) ...: 3.15e+74
## Effective number of parameters .................: 1.57e+74
##
## Marginal log-Likelihood: -1281.92
## Posterior marginals for the linear predictor and
## the fitted values are computed
The components are almost identical to those of the second model, except for the re-inclusion of the Diff_QQ_B_Q term in both the components and the formulae. Unlike in model 1, however, we define a prior distribution for it by including model='linear', mean.linear=1.17, prec.linear=100. By default, a Gaussian prior is placed on this parameter; we specified the prior mean with 'mean.linear' and the prior precision (the inverse of the variance) with 'prec.linear'. Note that 1.17 ≈ log(3.22).
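As a one-line reminder of INLA's precision parameterisation (precision = 1/variance), this shows how the prior standard deviation of 0.1 maps to prec.linear = 100:

```r
# INLA parameterises Gaussian priors by precision (1/variance),
# so a prior sd of 0.1 corresponds to a precision of 1 / 0.1^2 = 100
prior_sd   <- 0.1
prior_prec <- 1 / prior_sd^2
prior_prec  # 100
```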
We find that the DIC of this model lies in-between the first (unconstrained) and second (hard-constrained) models’ DIC values, as expected. The estimated effect of log_Depth is very similar to the previous two models.
Next, we plot the estimated intensity surface and estimated population size.
Dist_Brier_sq_plot3 <- pixels(mesh=mesh_land,mask=Domain)
Dist_Brier_sq_plot3 <- predict(fit_combined3,Dist_Brier_sq_plot3,~exp(Dist_Brier_sq*Brier_par + Intercept_WW_Q + Diff_QQ_B_Q),n.samples = 20, seed=seed)
ggplot() + gg(Dist_Brier_sq_plot3[1]) + colsc(Dist_Brier_sq_plot3[1]$mean) +
gg(Sightings_Brier_nodup,colour='green') + gg(Domain) +
ggtitle('Brier Effort')
Dist_Quoddy_sq_plot3 <- pixels(mesh=mesh_land,mask=Domain)
Dist_Quoddy_sq_plot3 <- predict(fit_combined3,Dist_Quoddy_sq_plot3,~exp(Dist_Quoddy_sq*Quoddy_par + Intercept_WW_Q),n.samples = 20, seed=seed)
ggplot() + gg(Dist_Quoddy_sq_plot3[1]) + colsc(Dist_Quoddy_sq_plot3[1]$mean) +
gg(Sightings_Quoddy_nodup,colour='green') + gg(Domain) +
ggtitle('Quoddy Effort')
plot_pixels_WW3 <- pixels(mesh_land, mask = Domain)
pred_int_WW3 <- predict(fit_combined3, plot_pixels_WW3,
~ exp(mySpatial + Intercept + log_Depth), n.samples = 20, seed=seed)
multiplot(ggplot() + gg(pred_int_WW1[1]) + gg(Domain) + gg.spatiallines_mod(Effort_survey, colour='red') +
colsc(quantile(c(pred_int_WW1@data$mean,pred_int_WW3@data$mean),probs = c(0.005,0.995))) + ggtitle('Joint Model 1 Mean'),
ggplot() + gg(pred_int_WW1[2]) + gg(Domain) + gg.spatiallines_mod(Effort_survey, colour='red') +
colsc(quantile(c(pred_int_WW1@data$sd,pred_int_WW3@data$sd),probs = c(0.005,0.995))) + ggtitle('Joint Model 1 SD'),
ggplot() + gg(pred_int_WW3[1]) + gg(Domain) + gg.spatiallines_mod(Effort_survey, colour='red') +
colsc(quantile(c(pred_int_WW1@data$mean,pred_int_WW3@data$mean),probs = c(0.005,0.995))) + ggtitle('Joint Model 3 Mean'),
ggplot() + gg(pred_int_WW3[2]) + gg(Domain) + gg.spatiallines_mod(Effort_survey, colour='red') +
colsc(quantile(c(pred_int_WW1@data$sd,pred_int_WW3@data$sd),probs = c(0.005,0.995))) + ggtitle('Joint Model 3 SD'),
layout = matrix(c(1:4),nrow=2,ncol=2,byrow = F))
Again, this third model seems to be predicting unrealistically high whale densities away from the tracklines. Let's restrict our plot to those pixels within 30 km of the nearest trackline.
pred_int_WW3_restricted <- pred_int_WW3[restricted_ind,]
multiplot(ggplot() + gg(pred_int_WW1_restricted[1]) + gg(Domain) + gg.spatiallines_mod(Effort_survey, colour='red') +
colsc(c(pred_int_WW1_restricted@data$mean,pred_int_WW3_restricted@data$mean)) + ggtitle('Joint Model 1 Mean'),
ggplot() + gg(pred_int_WW1_restricted[2]) + gg(Domain) + gg.spatiallines_mod(Effort_survey, colour='red') +
colsc(c(pred_int_WW1_restricted@data$sd,pred_int_WW3_restricted@data$sd)) + ggtitle('Joint Model 1 SD'),
ggplot() + gg(pred_int_WW3_restricted[1]) + gg(Domain) + gg.spatiallines_mod(Effort_survey, colour='red') +
colsc(c(pred_int_WW1_restricted@data$mean,pred_int_WW3_restricted@data$mean)) + ggtitle('Joint Model 3 Mean'),
ggplot() + gg(pred_int_WW3_restricted[2]) + gg(Domain) + gg.spatiallines_mod(Effort_survey, colour='red') +
colsc(c(pred_int_WW1_restricted@data$sd,pred_int_WW3_restricted@data$sd)) + ggtitle('Joint Model 3 SD'),
layout = matrix(c(1:4),nrow=2,ncol=2,byrow = F))
# What is the estimated population size?
Lambda_WW3 <- predict(fit_combined3, predpts, ~ sum(weight * exp(mySpatial + Intercept +
log_Depth)), n.samples = 20, seed=seed)
Lambda_WW3
## mean sd q0.025 median q0.975 smin smax cv
## 1 1185.293 529.1743 563.4462 980.2072 2296.64 536.1072 2505.421 0.4464501
## var
## 1 280025.4
Lambda_df <- rbind(Lambda_WW1,Lambda_WW2,Lambda_WW3,Lambda,Lambda2)
Lambda_df$Model <- c('Joint Model 1','Joint Model 2','Joint Model 3','Survey-only no depth','Survey-only with depth')
ggplot(Lambda_df,aes(x=Model, y=mean, ymax=q0.975, ymin=q0.025)) +
geom_point() + geom_errorbar()
# and in the restricted region?
Lambda_WW3_restricted <- predict(fit_combined3, predpts_restricted, ~ sum(weight * exp(mySpatial + Intercept +
log_Depth)), n.samples = 20, seed=seed)
Lambda_WW3_restricted
## mean sd q0.025 median q0.975 smin smax cv
## 1 540.4578 120.6534 374.4975 533.291 792.0214 357.4866 891.4815 0.2232429
## var
## 1 14557.23
Lambda_df_restricted <- rbind(Lambda_WW1_restricted,Lambda_WW2_restricted,Lambda_WW3_restricted,Lambda_restricted,Lambda2_restricted)
Lambda_df_restricted$Model <- c('Joint Model 1','Joint Model 2','Joint Model 3','Survey-only no depth','Survey-only with depth')
ggplot(Lambda_df_restricted,aes(x=Model, y=mean, ymax=q0.975, ymin=q0.025)) +
geom_point() + geom_errorbar() + ggtitle('Estimated Population Size in Restricted Region')
We find a qualitatively similar map of intensity compared with the first model, without the numerical instability seen in the second model. Estimates of population size across both the restricted and unrestricted domains are very similar to those from the first model, albeit with slightly reduced credible interval widths. Whilst this may change with the addition of more posterior samples, shorter credible intervals are to be expected since we are adding information to the model in the form of a prior distribution on the relative effort.
What is the estimated relative effort across the two ports? Given that we have normalised the effort fields by their total integrals, we can assess the estimated relative effort directly from the posterior marginal distribution of the exponential of Diff_QQ_B_Q.
# First plot the posterior distribution of the parameter on the log scale
plot(fit_combined3,'Diff_QQ_B_Q')
# To get the posterior of the exponential of the parameter, we can sample from the model.
samples_WW3 <- generate(fit_combined3,n.samples = 20, seed=seed) #already sampled
Diff_QQ_B_Q_samples <- sapply(samples_WW3, FUN = function(x){return(exp(x$Diff_QQ_B_Q))})
Diff_QQ_B_Q_samples <-
data.frame(mean=c(mean(Diff_QQ_B_Q_samples),3.22),
sd=c(sd(Diff_QQ_B_Q_samples),0.1),
LCL=c(quantile(Diff_QQ_B_Q_samples,probs=0.025),exp(log(3.22)-2*0.1)),
UCL=c(quantile(Diff_QQ_B_Q_samples,probs=0.975),exp(log(3.22)+2*0.1)),
name=factor(c('Posterior Relative Effort','Prior Relative Effort')))
ggplot(Diff_QQ_B_Q_samples, aes(x=name, y=mean, ymax=UCL, ymin=LCL)) +
geom_point() + geom_errorbar() + xlab('') + ylab('Relative Effort')
Finally, we plot the estimated effects of depth across the three models. We also include a vertical line showing the 90th percentile of the log_Depth values observed at the survey tracklines.
samples_WW1 <- generate(fit_combined1,n.samples = 20, seed=seed)
samples_WW2 <- generate(fit_combined2,n.samples = 20, seed=seed)
#samples_WW3 <- generate(fit_combined3,n.samples = 20, seed=seed)
# What is the upper 90th percentile of log depth seen by transect line?
max_depth_obs <- quantile(over(Effort_survey,log_Depth)$log_Depth, probs=c(0.90))
depthpred_WW1 <- sapply(samples_WW1,FUN = function(x){return(exp(depthdf$log_Depth*x$log_Depth))})
depthpred_WW2 <- sapply(samples_WW2,FUN = function(x){return(exp(depthdf$log_Depth*x$log_Depth))})
depthpred_WW3 <- sapply(samples_WW3,FUN = function(x){return(exp(depthdf$log_Depth*x$log_Depth))}) # note: use the model 3 samples here, not samples_WW1
multiplot(
ggplot(data.frame(mean=apply(depthpred_WW1,1, mean),
LCL=apply(depthpred_WW1,1, quantile, probs=c(0.025)),
UCL=apply(depthpred_WW1,1, quantile, probs=c(0.975)),
logdepth=seq(from=min(log_Depth@data$log_Depth),
to=max(log_Depth@data$log_Depth),
length.out = 100)),
aes(y=mean,x=logdepth,ymax=UCL,ymin=LCL)) +
geom_line() + geom_ribbon(alpha=0.4) +
geom_vline(xintercept = max_depth_obs) +
ylab('Relative intensity change'),
ggplot(data.frame(mean=apply(depthpred_WW2,1, mean),
LCL=apply(depthpred_WW2,1, quantile, probs=c(0.025)),
UCL=apply(depthpred_WW2,1, quantile, probs=c(0.975)),
logdepth=seq(from=min(log_Depth@data$log_Depth),
to=max(log_Depth@data$log_Depth),
length.out = 100)),
aes(y=mean,x=logdepth,ymax=UCL,ymin=LCL)) +
geom_line() + geom_ribbon(alpha=0.4) +
geom_vline(xintercept = max_depth_obs) +
ylab('Relative intensity change'),
ggplot(data.frame(mean=apply(depthpred_WW3,1, mean),
LCL=apply(depthpred_WW3,1, quantile, probs=c(0.025)),
UCL=apply(depthpred_WW3,1, quantile, probs=c(0.975)),
logdepth=seq(from=min(log_Depth@data$log_Depth),
to=max(log_Depth@data$log_Depth),
length.out = 100)),
aes(y=mean,x=logdepth,ymax=UCL,ymin=LCL)) +
geom_line() + geom_ribbon(alpha=0.4) +
geom_vline(xintercept = max_depth_obs) +
ylab('Relative intensity change'),
layout = matrix(1:3, nrow=1, ncol=3, byrow=T))
As a result of the relatively small number of sightings, we see a huge level of uncertainty associated with the parameter estimate.
We will now use the survey data from 2011 that we held out to compare our models. We will assume that the space use of the whales remained constant (i.e. no changes to their density occurred) from 2007 to 2011. Based on the locations of the 2011 survey's tracklines, we will predict the number of sightings expected from that survey under the best joint model and under the two survey-only models.
int_test <- ipoints(samplers = Effort_survey_test, domain = mesh_land,name='space')
# the probability of making a sighting >4km away is estimated to be negligible
int_test_dist <- ipoints(domain = INLA::inla.mesh.1d(seq(0, 4, length.out = 30)),
name='distance')
# This defines integration points in both 2D space and 1D distance dimensions
int_test_prod <- cprod(int_test,int_test_dist)
ggplot() + gg(Domain) + gg(int_test_prod) + gg.spatiallines_mod(Effort_survey_test, colour='pink')
# Where do these points come from? They are the mesh vertices!
ggplot() + gg(mesh_land) + gg(int_test_prod) + gg.spatiallines_mod(Effort_survey_test, colour='pink')
sum(int_test_prod$weight)
## [1] 6952.489
sum(int_test$weight)
## [1] 1738.122
# Notice that the sum of the product weights is 4 times larger, because the distance mesh extends 4km out from each line. We still need to scale by 2, by adding log(2), to account for sightings made on both sides of the tracklines!
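To see why the product weights sum to four times the spatial weights, note that the 1D integration weights over the distance interval [0, 4] must sum to the interval length. A minimal base-R sketch using trapezoidal weights (an assumption for illustration; inlabru derives its 1D weights from the mesh basis, but they sum to the same total):

```r
# Trapezoidal integration weights on a 30-point 1D grid over [0, 4]:
# interior points get the full spacing, the two endpoints get half of it
d <- seq(0, 4, length.out = 30)
h <- diff(d)[1]
w <- c(h/2, rep(h, 28), h/2)
sum(w)  # the weights sum to the interval length, 4
```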
# We need to remove the 3rd dimension that has been added
int_test_prod@coords <- int_test_prod@coords[,c(1:2)]
int_test_prod@bbox <- int_test_prod@bbox[c(1:2),]
Lambda_test_fit_combined <-
predict(fit_combined3, int_test_prod,
~ rpois(1,
lambda = sum( weight * exp(mySpatial + Intercept + log_Depth
+ log_hn(distance,lsig) + log(2)))),
n.samples=200, seed=seed)
Lambda_test_fit_combined
## mean sd q0.025 median q0.975 smin smax cv var
## 1 8.245 3.697748 2 8 16 1 27 0.4484837 13.67334
Lambda_test_fit_single <-
predict(fit2, int_test_prod,
~ rpois(1,
lambda = sum( weight * exp(mySpatial + Intercept + log_Depth
+ log_hn(distance,lsig) + log(2)))),
n.samples=200, seed=seed)
Lambda_test_fit_single
## mean sd q0.025 median q0.975 smin smax cv var
## 1 7.985 3.734233 3 8 15 1 23 0.467656 13.9445
Lambda_test_fit_single_nocov <-
predict(fit, int_test_prod,
~ rpois(1,
lambda = sum( weight * exp(mySpatial + Intercept
+ log_hn(distance,lsig) + log(2)))),
n.samples=200, seed=seed)
Lambda_test_fit_single_nocov
## mean sd q0.025 median q0.975 smin smax cv var
## 1 8.125 3.271527 3 7 15.025 2 21 0.4026495 10.70289
# How many sightings were made in the survey in 2011?
dim(Sightings_survey_test@coords) #15
## [1] 15 2
Lambda_tests <- rbind(Lambda_test_fit_combined,
Lambda_test_fit_single,
Lambda_test_fit_single_nocov)
Lambda_tests$Model <- factor(c('Joint Model 3', 'Survey-only Model with log_Depth','Survey-only Model without covariates'))
ggplot(Lambda_tests, aes(x=Model, y=mean, ymax=q0.975, ymin=q0.025)) +
geom_point() + geom_errorbar() +
geom_hline(yintercept = 15, colour='red',linetype='dashed')
We see that all three models under-estimate the total observed number of sightings, but that all three models contain the observed value within their 95% credible intervals. There may be a slight improvement offered by the joint model with respect to point estimates and uncertainty quantification, but the difference is minor.
An alternative approach for model comparison is to compare the spatial distribution of the points with respect to the fitted intensity surfaces. In other words, how well do our models capture the spatial distribution of 2011's whale intensity?
Few methods exist for investigating this for log-Gaussian Cox process models. One idea could be to map each sighting location to the closest mesh vertex and each sighting distance to the closest value in the 1D distance mesh. Then, we could evaluate the Poisson log density of the resulting counts (0s and 1s) at every combination of mesh vertex and distance. Summing these values gives a score function for comparing the models, and by sampling from the posterior we can approximate the posterior distribution of these scores.
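Before applying this to the fitted models, here is a toy illustration of the Poisson log-score idea (the counts and intensities are hypothetical, not from the workshop data): the intensity that better matches the observed counts receives the higher, i.e. less negative, summed log density.

```r
# Toy Poisson log-score: observed counts at 5 cells, two candidate intensities
counts      <- c(0, 1, 0, 2, 0)
lambda_good <- c(0.1, 0.9, 0.2, 1.8, 0.1)  # close to the observed pattern
lambda_poor <- c(1.0, 0.1, 1.0, 0.1, 1.0)  # badly misspecified
score_good <- sum(dpois(counts, lambda_good, log = TRUE))
score_poor <- sum(dpois(counts, lambda_poor, log = TRUE))
score_good > score_poor  # TRUE: the better-fitting intensity scores higher
```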
# Loop through sightings. Find closest mesh locations and distance value.
int_ind <- rep(NA, dim(Sightings_survey_test@coords)[1])
int_test_prod$count <- 0
# loop through the sightings
for(i in 1:dim(Sightings_survey_test@coords)[1])
{
int_ind[i] <- intersect(
# find the closest spatial index in the integration points
as.numeric(apply(gDistance(int_test_prod ,Sightings_survey_test[i,],
byid=T),
1,FUN = function(x){which(x==min(x))})),
# find the closest 1D index for the distances
which(abs(Sightings_survey_test$DISTANCE[i]-int_test_prod$distance) ==
min(abs(Sightings_survey_test$DISTANCE[i]-int_test_prod$distance)))
)
# Update the count by 1
int_test_prod$count[int_ind[i]] <- int_test_prod$count[int_ind[i]] + 1
}
# Now evaluate the posterior mean of the Poisson score function for the models
# Higher values of score indicate a better model fit
Score_joint <-
predict(fit_combined3, int_test_prod,
~ sum(dpois(count,
lambda = weight * exp(mySpatial + Intercept + log_Depth
+ log_hn(distance,lsig) + log(2)),
log = T)),
n.samples=200, seed=seed)
Score_joint
## mean sd q0.025 median q0.975 smin smax cv
## 1 -186.0317 35.45233 -264.041 -178.4565 -136.0709 -324.9121 -121.698 -0.1905715
## var
## 1 1256.868
Score_cov <-
predict(fit2, int_test_prod,
~ sum(dpois(count,
lambda = weight * exp(mySpatial + Intercept + log_Depth
+ log_hn(distance,lsig) + log(2)),
log = T)),
n.samples=200, seed=seed)
Score_cov
## mean sd q0.025 median q0.975 smin smax
## 1 -185.4264 39.67052 -277.1371 -175.4588 -133.1542 -378.0772 -126.5298
## cv var
## 1 -0.2139421 1573.75
Score_nocov <-
predict(fit, int_test_prod,
~ sum(dpois(count,
lambda = weight * exp(mySpatial + Intercept
+ log_hn(distance,lsig) + log(2)),
log = T)),
n.samples=200, seed=seed)
Score_nocov
## mean sd q0.025 median q0.975 smin smax
## 1 -181.8137 31.84523 -242.9247 -177.6743 -135.0516 -352.8489 -124.3663
## cv var
## 1 -0.175153 1014.119
Scores <- rbind(Score_joint,
Score_cov,
Score_nocov)
Scores$Model <- factor(c('Joint Model 3', 'Survey-only Model with log_Depth','Survey-only Model without covariates'))
ggplot(Scores, aes(x=Model, y=mean, ymax=q0.975, ymin=q0.025)) +
geom_point() + geom_errorbar() +
ylab('Score') + ggtitle('Posterior Scores of the 3 Models')
So what do we find out? We see that including the log_Depth in a model for the whale intensity failed to improve either the predictions of the total number of encounters in 2011, or the predicted spatial distribution of these encounters. Furthermore, incorporating the whale watch sightings also failed to improve either measure of predictive performance.
So why haven’t things improved? For starters, the test dataset used for model comparison contained only 15 sightings. Thus only huge differences in model fit would likely be detected by the scoring approach. Secondly, we made strong assumptions that: the whale density did not change across the three months, each survey followed identical protocols with identical detection functions and the whale density in 2011 matched that in 2007-2009. Finally, even if we did improve the accuracy of our maps of whale intensity in the regions in close proximity to the whale watch ports, it is likely that neither of our model comparison measures would detect this. This is because the survey effort in 2011 took place largely outside of these regions.
What have we gained? For starters, by including the whale watch data within a joint model, we gained knowledge about the relationship between whale density and bathymetry. Additionally, we now have a model for the number of sightings to be expected from the whale watch companies given an estimate of their total efforts. This could be used to detect large changes in the whale intensity within the waters in close proximity to their ports.
In the interests of computation time, we only considered a small subset of the total data available on the fin whales for this workshop. It would be interesting to repeat this modelling process on the entire dataset! The resolution, precision, and accuracy of the intensity maps would likely increase greatly, as would the gaps in performance between the joint and survey-only models. In addition to the survey and whale watch data, opportunistic sightings from commercial vessels and other sources are also available. These could also be included in a joint model, with estimates of effort coming from AIS layers! Many of the numerical instabilities we faced when fitting these models would also disappear with more data!
Throughout the modelling stages, we assumed a half-normal detection function. Re-fit the 'best' joint model, replacing the half-normal detection function with the following hazard rate model with shape parameter 5. How does the model compare with the half-normal model? Are there any significant changes to the parameter estimates? Can you spot the evidence that the model has not fitted properly (note the prior distribution used!)? HINT: The model fitting for this model is very unstable and slow. Once you have built the like objects, run fit_combined4 <- readRDS('./Model_Files/fit_combined4.rds') to load a pre-fitted model object.
hr <- function(distance, lsig) {
1 - exp(-(distance / (exp(lsig)))^-5)
}
log_hr <- function(...){log(hr(...))}
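To get a feel for this shape before fitting anything, we can evaluate hr at a few distances (a sketch with lsig = 0, i.e. a scale of exp(0) = 1 km, repeating the function definition so the snippet is self-contained): detection is near-certain well inside the scale and drops off sharply beyond it.

```r
# Hazard-rate detection function with shape parameter 5, as defined above
hr <- function(distance, lsig) {
  1 - exp(-(distance / (exp(lsig)))^-5)
}
# With scale exp(0) = 1 km: near-certain detection at short range,
# then a sharp fall-off past 1 km
round(hr(c(0.25, 0.5, 1, 2, 4), lsig = 0), 3)  # 1.000 1.000 0.632 0.031 0.001
```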
cmp_WW_Combined4 <- ~
mySpatial_Brier(main = coordinates, copy='mySpatial', fixed=T) +
mySpatial_Quoddy(main = coordinates, copy='mySpatial', fixed=T) +
Dist_Quoddy_sq(main=Dist_Quoddy_sq, model='offset') +
Dist_Brier_sq(main=Dist_Brier_sq, model='offset') +
Intercept_WW_Q(1) +
Diff_QQ_B_Q(main=1,model='linear',mean.linear=1.17, prec.linear=100) +
mySpatial(main = coordinates, model = matern_land) +
log_Depth(main = log_Depth, model='linear') +
Intercept(1) +
lsig(1) +
Quoddy_par(1) +
Brier_par(1)
formula_WW_B4 = coordinates ~ mySpatial_Brier +
Dist_Brier_sq*Brier_par +
Intercept_WW_Q +
Diff_QQ_B_Q +
int_dist_Brier(Brier_par) +
log_Depth
formula_WW_Q4 = coordinates ~ mySpatial_Quoddy +
Dist_Quoddy_sq*Quoddy_par +
Intercept_WW_Q +
int_dist_Quoddy(Quoddy_par) +
log_Depth
formula_survey_4 = coordinates + distance ~ mySpatial + Intercept +
log_Depth + log_hr(distance, lsig) + log(2)
fit_WW_B4 = like(data = Sightings_Brier_nodup,
domain = list(
coordinates = mesh_land),
formula = formula_WW_B4,
family = 'cp')
fit_WW_Q4 = like(data = Sightings_Quoddy_nodup,
domain = list(
coordinates = mesh_land),
formula = formula_WW_Q4,
family = 'cp')
fit_extra2_like = like(data = Sightings_survey,
samplers = Effort_survey,
domain = list(
coordinates = mesh_land,
distance = INLA::inla.mesh.1d(seq(0, 2, length.out = 30))),
formula = formula_survey_4,
family = 'cp')
#fit_combined4 <- bru(fit_extra2_like, fit_WW_B4, fit_WW_Q4, components = cmp_WW_Combined4,options = list(bru_initial = list(lsig = 0.2)))
fit_combined4 <- readRDS('./Model_Files/fit_combined4.rds')
fit_combined3$dic$dic #DIC is 2308
## [1] 2307.887
fit_combined4$dic$dic #DIC is 2642 - much worse with this detection function
## [1] 2642.317
summary(fit_combined4)
## inlabru version: 2.2.4.9000
## INLA version: 21.01.08-1
## Components:
## mySpatial_Brier: Copy of 'mySpatial' (types main='spde', group='exchangeable', replicate='iid)
## mySpatial_Quoddy: Copy of 'mySpatial' (types main='spde', group='exchangeable', replicate='iid)
## Dist_Quoddy_sq: Model types main='offset', group='exchangeable', replicate='iid'
## Dist_Brier_sq: Model types main='offset', group='exchangeable', replicate='iid'
## Intercept_WW_Q: Model types main='linear', group='exchangeable', replicate='iid'
## Diff_QQ_B_Q: Model types main='linear', group='exchangeable', replicate='iid'
## mySpatial: Model types main='spde', group='exchangeable', replicate='iid'
## log_Depth: Model types main='linear', group='exchangeable', replicate='iid'
## Intercept: Model types main='linear', group='exchangeable', replicate='iid'
## lsig: Model types main='linear', group='exchangeable', replicate='iid'
## Quoddy_par: Model types main='linear', group='exchangeable', replicate='iid'
## Brier_par: Model types main='linear', group='exchangeable', replicate='iid'
## Likelihoods:
## Family: 'cp'
## Data class: 'SpatialPointsDataFrame'
## Predictor:
## coordinates + distance ~ mySpatial + Intercept + log_hr(distance,
## lsig) + log(2)
## Family: 'cp'
## Data class: 'SpatialPointsDataFrame'
## Predictor:
## coordinates ~ mySpatial_Brier + Dist_Brier_sq * Brier_par + Intercept_WW_Q +
## Diff_QQ_B_Q + int_dist_Brier(Brier_par) + log_Depth
## Family: 'cp'
## Data class: 'SpatialPointsDataFrame'
## Predictor:
## coordinates ~ mySpatial_Quoddy + Dist_Quoddy_sq * Quoddy_par +
## Intercept_WW_Q + int_dist_Quoddy(Quoddy_par) + log_Depth
## Time used:
## Pre = 12.7, Running = 83.2, Post = 1.37, Total = 97.3
## Fixed effects:
## mean sd 0.025quant 0.5quant 0.975quant mode kld
## Intercept_WW_Q -72.735 8.380 -89.249 -72.714 -56.354 -72.671 0
## Diff_QQ_B_Q 1.169 0.100 0.973 1.169 1.365 1.169 0
## log_Depth -0.320 0.083 -0.480 -0.321 -0.156 -0.323 0
## Intercept -7.605 0.383 -8.360 -7.604 -6.857 -7.601 0
## lsig 0.207 0.073 0.059 0.209 0.346 0.212 0
## Quoddy_par 0.000 0.002 -0.005 0.000 0.005 0.000 0
## Brier_par -207.877 25.674 -258.461 -207.817 -157.676 -207.695 0
##
## Random effects:
## Name Model
## mySpatial SPDE2 model
## mySpatial_Brier Copy
## mySpatial_Quoddy Copy
##
## Model hyperparameters:
## mean sd 0.025quant 0.5quant 0.975quant mode
## Range for mySpatial 270.488 0.862 269.065 270.477 271.906 269.251
## Stdev for mySpatial 0.618 0.000 0.617 0.618 0.618 0.618
##
## Expected number of effective parameters(stdev): 23.46(0.00)
## Number of equivalent replicates : 567.52
##
## Deviance Information Criterion (DIC) ...............: 2642.32
## Deviance Information Criterion (DIC, saturated) ....: NaN
## Effective number of parameters .....................: 10.56
##
## Watanabe-Akaike information criterion (WAIC) ...: 1.59e+24
## Effective number of parameters .................: 7.96e+23
##
## Marginal log-Likelihood: -1461.09
## Posterior marginals for the linear predictor and
## the fitted values are computed
Now, predict the fitted detection function from fit_combined4 and compare it with the detection function estimated from fit_combined3. Combine the two into a single plot using multiplot. HINT: You may need to use the generate and sapply method if an error message is returned when using predict.
distdf <- data.frame(distance = seq(0, 2, length = 100))
dfun_fc3 <- generate(fit_combined3, n.samples = 20)
dfun_fc3 <- sapply(dfun_fc3, FUN = function(x){exp(log_hn(distdf$distance, as.numeric(x$lsig)))})
dfun_fc4 <- generate(fit_combined4, n.samples = 20)
dfun_fc4 <- sapply(dfun_fc4, FUN = function(x){exp(log_hr(distdf$distance, as.numeric(x$lsig)))})
dfun_df <- data.frame(distance=distdf$distance,
mean_hnorm=apply(dfun_fc3,1,mean),
mean_hazard=apply(dfun_fc4,1,mean),
LCL_hnorm=apply(dfun_fc3,1,quantile,probs=0.025),
UCL_hnorm=apply(dfun_fc3,1,quantile,probs=0.975),
LCL_hazard=apply(dfun_fc4,1,quantile,probs=0.025),
UCL_hazard=apply(dfun_fc4,1,quantile,probs=0.975)
)
multiplot(
ggplot(dfun_df,aes(x=distance, y=mean_hnorm, ymax=UCL_hnorm, ymin=LCL_hnorm)) +
geom_line() + geom_ribbon(alpha=0.4, colour='red', fill='red') +
ylab('Probability of detection half normal'),
ggplot(dfun_df,aes(x=distance, y=mean_hazard, ymax=UCL_hazard, ymin=LCL_hazard)) +
geom_line() + geom_ribbon(alpha=0.4, colour='blue', fill='blue') +
ylab('Probability of detection hazard (shape 5)'),
layout=matrix(1:2, nrow=2, ncol=1, byrow = T)
)
Plot a map of the estimated search effort half-normal functions from both ports. What do you notice? Is this realistic? HINT: To get the half-normal functions we want to investigate the quantities exp(Dist_Brier_sq*Brier_par) and exp(Dist_Quoddy_sq*Quoddy_par).
Dist_Brier_sq_plot4 <- pixels(mesh=mesh_land,mask=Domain)
Dist_Brier_sq_plot4 <- predict(fit_combined4,Dist_Brier_sq_plot4,~exp(Dist_Brier_sq*Brier_par),n.samples = 20, seed=seed)
ggplot() + gg(Dist_Brier_sq_plot4[1]) + colsc(Dist_Brier_sq_plot4[1]$mean) +
gg(Sightings_Brier_nodup,colour='green') + gg(Domain) +
ggtitle('Brier Relative Effort Intensity')
Dist_Quoddy_sq_plot4 <- pixels(mesh=mesh_land,mask=Domain)
Dist_Quoddy_sq_plot4 <- predict(fit_combined4,Dist_Quoddy_sq_plot4,~exp(Dist_Quoddy_sq*Quoddy_par),n.samples = 20, seed=seed)
ggplot() + gg(Dist_Quoddy_sq_plot4[1]) + colsc(Dist_Quoddy_sq_plot4[1]$mean) +
gg(Sightings_Quoddy_nodup,colour='green') + gg(Domain) +
ggtitle('Quoddy Relative Effort Intensity')
# The model predicts that the whale watch vessels departing from Quoddy spread uniformly across the entire domain, visiting each point with almost constant intensity! This is not realistic and shows that the model is unsuitable
Additional opportunistic sightings are available in the SpatialPointsDataFrame object Sightings_Opp_sp. Obtain AIS vessel density data, or manually create a rough effort surface by hand. Try to incorporate these additional sightings into a model. What changes?
It is important to note that the instability seen in some of the models can be attributed to the small sizes of the datasets used in these workshops. We are fitting fairly complex Bayesian models to point pattern datasets containing only a limited number of events.
In practice, when this general framework is applied to much larger datasets, these numerical instabilities will disappear. More complex terms such as nonlinear covariate terms, covariate interactions, temporal and seasonal effects, and spatio-temporal terms can all be considered and may substantially improve model fit. For completeness, we briefly demonstrate the code required to fit spatio-temporal models next.
inlabru allows Kronecker-product separable spatio-temporal models to be defined easily. First, we would need to define integration points manually using the ipoints function.
ips <- ipoints(Effort_survey, mesh_land, group = "YEAR")
Next we would need to modify the mySpatial term within the components as follows:
cmp <- LHSstuff ~ RHSstuff + mySpatial(main = coordinates, model = matern_land,
group = YEAR, ngroup = 3,
control.group = list(model='?'))
where '?' can be 'iid' for independent spatial fields each year, or 'ar1' for an autoregressive separable spatio-temporal field; see ?control.group for the possible models.
We cannot fit a spatio-temporal model here, both because the data are insufficient and because the survey tracklines visited non-overlapping regions (non-randomly) each year!
These workshops would not have been possible without the extensive organisational and data-preparation efforts of Shelley Lang and Catalina Gomez. I thank you both! Additionally, I would like to thank Marie Auger-Méthé for all her efforts with reviewing and reformatting these sessions. Finally, I would like to thank Finn Lindgren for making a stable release version of inlabru in time for these workshops and for providing coding assistance.